Software Patent Institute (SPI) (Select "Free Access")
SPIE Digital Library

(journals and proceedings on optics and photonics) 
(A resurrected version of the old "Computer Select" database, providing full-text access to over
100 technology-focused publications, a glossary of technical terms, product reviews and over
click browser's "Reload" or "Refresh" button.)

Books and Journals 

Search STIC Online Catalog
InfoSECURITYnetBASE 
NetLibrary.com

(Multidisciplinary subject coverage) 
Safari Online Books 

(Computer and information technology) 
Springer Publishing Company 

(biotech, physics, and computer journals) 

Daily Newspapers 

Full-text newspaper articles are available electronically in ProQuest Direct.

CD-ROM Resources

Older full-text NPL resources/articles received in CD-ROM format. These resources are
available on EIC2100 PCs in CPK2, 4B40.

Equipment 

Fax (571-273-0044). 
Optical Scanners 

- Use OmniPage Pro software to scan your documents. 
Power Mac G3 
Photocopier 

Reference Tools 
Bartleby.com

(Several versions of Roget's Thesaurus, a dictionary, an encyclopedia, quotations, English usage
books and more.)
Computer References 

(Dictionaries, Acronyms Finders, Encyclopedias) 
Efunda 

(30,000 pages of engineering fundamentals and calculators) 
Encyclopedia Britannica 
Encyclopedia of Software Engineering 
Eric Weisstein's World of Mathematics 

(A comprehensive online encyclopedia of mathematics.) 
HowStuffWorks 
(Search a term to find articles that explain how it works.)
The Internet Encyclopedia 
Over 2000 Glossary Links 

(Links to numerous technical, specialty, and general glossaries.) 
PCWebopedia 

Wiley Encyclopedia of Electrical and Electronics Engineering 

Xreferplus 

YourDictionary.com

(Numerous "specialty dictionaries"... technological, law, business related and more.) 

Services 

EIC2100 Staff 

Foreign Patent Services 

PLUS 

Request a PLUS Search 

[IFW case] [Paper case] 
Request a Book/Journal Purchase
Request a Book or Article 
Request a Foreign Patent Publication 

[e-submit] [Printable form] 
Request a Search 

[e-submit] [Printable form] 

Fast & Focused Search Criteria 
STIC Online Catalog 
Translation Services 



Web Resources 

A Brief History of the Hard Disk Drive 

CiteSeer (ResearchIndex)

(Full-text scientific research papers in PDF and PostScript formats.)
Interfacebus.com 

(Listing of Electronic Interface Buses with links to standards and specifications.) 
Internet Engineering Task Force

(The IETF Secretariat, run by the Corporation for National Research Initiatives with funding from
the US government, maintains an index of Internet-Drafts.)
Nanotechnology

PCI Specifications (username: uspto; password: pat222) 

("Peripheral Component Interconnect" specifications and white papers.) 

Requests for Comments (RFCs) Database 

(The Requests for Comments (RFC) document series is a set of technical and organizational
notes about the Internet (originally the ARPANET), beginning in 1969 and discussing many
aspects of computer networking, including protocols, procedures and concepts as well as
meeting notes and opinions.)

Scirus 

Usenet Archive (Google Groups)
Wayback Machine

(Archived web pages.)

If you cannot access some files because of a missing or non-working plug-in for
PDFs or Word documents, please contact the Help Desk at 305-9000 for installation assistance.

Last Modified: 01/12/2005 11:27:40
1 Computing applications III: medical: An analysis of the job market for biomedical
computer scientists 

Fred R. Sias 

April 1976 Proceedings of the 14th annual Southeast regional conference 

Full text available: pdf (367.74 KB). Additional Information: full citation, abstract, references, citings

Biomedical Information and Computer Science is an academic area that has received much 
interest recently. A number of training programs have been developed around the country. 
This paper is an examination of the potential market for biomedical information and 
computer scientists. It is possible to identify a number of organizations that may potentially
employ biomedical computer scientists. Included in such a list are medical schools, hospitals 
above a certain size, software houses, health mainte ... 

2 Decision trees with minimal costs 

Charles X. Ling, Qiang Yang, Jianning Wang, Shichao Zhang 

July 2004 Twenty-first international conference on Machine learning 

Full text available: pdf (306.59 KB). Additional Information: full citation, abstract, references

We propose a simple, novel and yet effective method for building and testing decision trees 
that minimizes the sum of the misclassification and test costs. More specifically, we first put
forward an original and simple splitting criterion for attribute selection in tree building. Our 
tree-building algorithm has many desirable properties for a cost-sensitive learning system 
that must account for both types of costs. Then, assuming that the test cases may have a 
large number of missing values, we ... 



3 6-2 VRC in simulation & training: Multidimensional volume visualization for PC-based
microsurgical simulation system

Zhenlan Wang, Chee-Kong Chui, Yiyu Cai, Chuan-Heng Ang 

June 2004 Proceedings of the 2004 ACM SIGGRAPH international conference on Virtual 
Reality continuum and its applications in industry 

Full text available: pdf (446.38 KB). Additional Information: full citation, abstract, references, index terms

Microsurgery is a highly complex surgical procedure on small body parts performed by a 
dedicated surgical team. An operating microscope is typically used to obtain a precise view 
of the soft tissues. The complexity of the microsurgical procedure makes it a suitable 
application of virtual/augmented reality technology for training purpose. In this paper, we 
present an overview of our simulator and then describe the visualization work that 
reconstructs the magnified view of the operating area from ... 
4 Learning methods to combine linguistic indicators: improving aspectual classification
and revealing linguistic insights 
Eric V. Siegel, Kathleen R. McKeown 

December 2000 Computational Linguistics, volume 26 issue 4 

Full text available: pdf (1.96 MB), Publisher Site. Additional Information: full citation, abstract, references

Aspectual classification maps verbs to a small set of primitive categories in order to reason 
about time. This classification is necessary for interpreting temporal modifiers and assessing 
temporal relationships, and is therefore a required component for many natural language 
applications. A verb's aspectual category can be predicted by co-occurrence frequencies
between the verb and certain linguistic modifiers. These frequency measures, called 
linguistic indicators, are chosen by linguistic insi ... 

5 Special issue on learning from imbalanced datasets: Minority report in fraud detection:
classification of skewed data

Clifton Phua, Damminda Alahakoon, Vincent Lee 

June 2004 ACM SIGKDD Explorations Newsletter, volume 6 issue 1

Full text available: pdf (262.38 KB). Additional Information: full citation, abstract, references, citings

This paper proposes an innovative fraud detection method, built upon existing fraud 
detection research and Minority Report, to deal with the data mining problem of skewed 
data distributions. This method uses backpropagation (BP), together with naive Bayesian 
(NB) and C4.5 algorithms, on data partitions derived from minority oversampling with 
replacement. Its originality lies in the use of a single meta-classifier (stacking) to choose the 
best base classifiers, and then combine these base ... 

Keywords: fraud detection, meta-learning, multiple classifier systems 



6 Cost/benefit based adaptive dialog: case study using empirical medical practice norms 
and intelligent split menus 
Jim Warren 

January 2001 Australian Computer Science Communications, Proceedings of the 2nd
Australasian conference on User interface, volume 23 issue 5
Full text available: pdf (843.80 KB), Publisher Site. Additional Information: full citation, abstract, references, citings, index terms, review

The notion of an adaptive user interface, one that accommodates user needs based on 
knowledge of the task at hand, is compelling but difficult to make practical. This paper 
examines models of the utility (as balancing of cost and benefit) in the initiation of task- 
specific dialog based on conditional probability of user goals in context. Illustrations in this 
paper are based on an empirical model of General Practice (GP) medicine as derived from a 
large database of GP/patient encounters. Applica ... 
7 Examining alternative End-Stage Renal Disease (ESRD) therapies through simulation

Stephen D. Roberts, Thomas L Gross, Douglas R. Maxwell 

March 1979 Proceedings of the 12th annual symposium on Simulation 

Full text available: pdf (1.04 MB). Additional Information: full citation, abstract, references, index terms
To examine the costs and effects of alternative treatments for End-Stage Renal Disease 
(ESRD), we constructed a simulation model which estimates patient survival and lifetime
cost for those with ESRD. The model, written in the INS simulation language, considers 
home and center hemodialysis as well as live related and cadaver donor transplantation. 
After the model was validated, the cost per life year gained for each therapy was computed. 
Center hemodialysis was found to have the poorest cos ... 



8 98¢/Mflops/s ultra-large-scale neural-network training on a PIII cluster
Douglas A. Aberdeen, Jonathan Baxter, Robert Edwards 

November 2000 Proceedings of the 2000 ACM/IEEE conference on Supercomputing 
(CDROM) 

Full text available: pdf (215.33 KB), Publisher Site. Additional Information: full citation, abstract, references, citings, index terms

Artificial neural networks with millions of adjustable parameters and a similar number of 
training examples are a potential solution for difficult, large-scale pattern recognition 
problems in areas such as speech and face recognition, classification of large volumes of 
web data and finance. The bottleneck is that neural network training involves iterative 
gradient descent and is extremely computationally intensive. In this paper we present a 
technique for distributed training of Ultra Large ... 



Keywords: neural-network, Linux cluster, matrix-multiply 



9 May I interrupt?: BusyBody: creating and fielding personalized models of the cost of
interruption

Eric Horvitz, Paul Koch, Johnson Apacible 

November 2004 Proceedings of the 2004 ACM conference on Computer supported 
cooperative work 

Full text available: pdf (145.02 KB). Additional Information: full citation, abstract, references, index terms

Interest has been growing in opportunities to build and deploy statistical models that can 
infer a computer user's current interruptability from computer activity and relevant
contextual information. We describe a system that intermittently asks users to assess their 
perceived interruptability during a training phase and that builds decision-theoretic models 
with the ability to predict the cost of interrupting the user. The models are used at run-time 
to compute the expected cost of interrupt ... 

Keywords: cost of interruption, models of attention, notification systems 



10 Man-machine communications in the biological-medical research environment
William E. Farley, Alfred H. Pulido, Tate M. Minckler, Lee D. Cady 
January 1966 Proceedings of the 1966 21st national conference 

Full text available: pdf (297.42 KB). Additional Information: full citation, abstract, references, index terms

The key source of raw data in most biomedical research is the patients medical record. The 
hospital patient medical record is most commonly thought of as the repository for all 
pertinent facts relating to laboratory test results, diagnostic conclusions, treatment 
procedures, and observations. Depending on the nature of the patients complaint, there are 
varying amounts of medical history information incorporated into the record. In many 
instances, the compilation of the record has become s ... 




11 On becoming virtual: the driving forces and arrangements
Magid Igbaria, Conrad Shayo, Lorne Olfman

April 1999 Proceedings of the 1999 ACM SIGCPR conference on Computer personnel 
research 
Full text available: pdf (1.80 MB). Additional Information: full citation, references, index terms



Keywords: telework, virtual communities, virtual organizations, virtual society, virtual 
teams 



12 Special issue on learning from imbalanced datasets: Mining with rarity: a unifying
framework
Gary M. Weiss 

June 2004 ACM SIGKDD Explorations Newsletter, volume 6 issue 1

Full text available: pdf (182.31 KB). Additional Information: full citation, abstract, references, citings

Rare objects are often of great interest and great value. Until recently, however, rarity has 
not received much attention in the context of data mining. Now, as increasingly complex 
real-world problems are addressed, rarity, and the related problem of imbalanced data, are 
taking center stage. This article discusses the role that rare classes and rare cases play in 
data mining. The problems that can result from these two forms of rarity are described in 
detail, as are methods for addressing these ... 

Keywords: class imbalance, cost-sensitive learning, inductive bias, rare cases, rare classes, 
sampling, small disjuncts 



13 Turmoil at NASA, and numerous funding announcements 
Xiaolei Qian 

September 1995 ACM SIGMOD Record, volume 24 issue 3 

Full text available: pdf (115.95 KB). Additional Information: full citation, abstract, index terms

Since the last issue of this column six months ago, there have been many interesting 
program announcements, some of which have already passed deadline. We'll go over these 
announcements anyway, with the hope that they can get the readers better prepared for 
future funding opportunities. But first, we'll talk about the continuing budget battle at 
Congress, and the recent turmoil at NASA. 

14 Computer aides to medical diagnosis — problems and progress 
Stephen R. Yarnall, Richard A. Kronmal 

July 1966 Communications of the ACM, volume 9 issue 7

Full text available: pdf (654.43 KB). Additional Information: full citation



15 Man/machine communications in the biological medical research environment 
W. E. Farley, A. H. Pulido, T. M. Minckler, L. D. Cady 
July 1966 Communications of the ACM, volume 9 issue 7

Full text available: pdf (654.43 KB). Additional Information: full citation



16 Engineering, medical and scientific applications 
G. H. Kuby 

July 1966 Communications of the ACM, volume 9 issue 7 
Full text available: pdf (654.43 KB). Additional Information: full citation

http://portal.acm.org/resul^ 2/16/05

Results (page 1): ((+"medical cost") and (+"training data"))



17 META5: A tool to manipulate strings of data 
David K. Oppenheim, Dan P. Haggerty 
July 1966 Communications of the ACM, volume 9 issue 7 

Full text available: pdf (654.43 KB). Additional Information: full citation



18 A real-time error correcting data transmission system treated as a Markov process 
Frank T. Kuhn 

July 1966 Communications of the ACM, Volume 9 Issue 7 
Full text available: pdf (654.43 KB). Additional Information: full citation



19 Lunar orbiter command and telemetry data handling system (CTDH) at deep space 
stations 



I. Holgersen, E. Knutson, D. R. Merrill 

July 1966 Communications of the ACM, volume 9 issue 7

Full text available: pdf (654.43 KB). Additional Information: full citation



20 A special purpose multiprogramming system for a computer-controlled telemetry data 
reduction system 
Harold R. Gillette 

July 1966 Communications of the ACM, volume 9 issue 7 
Full text available: pdf (654.43 KB). Additional Information: full citation
1 Special issue on learning from imbalanced datasets: A study of the behavior of several
methods for balancing machine learning training data 
Gustavo E. A. P. A. Batista, Ronaldo C. Prati, Maria Carolina Monard 
June 2004 ACM SIGKDD Explorations Newsletter, volume 6 issue 1

Full text available: pdf (314.77 KB). Additional Information: full citation, abstract, references, citings

There are several aspects that might influence the performance achieved by existing 
learning systems. It has been reported that one of these aspects is related to class 
imbalance in which examples in training data belonging to one class heavily outnumber the 
examples in the other class. In this situation, which is found in real world data describing an 
infrequent but important event, the learning system may have difficulties to learn the 
concept related to the minority class. In this work we per ... 

2 Cost/benefit based adaptive dialog: case study using empirical medical practice norms
and intelligent split menus 
Jim Warren 

January 2001 Australian Computer Science Communications, Proceedings of the 2nd
Australasian conference on User interface, volume 23 issue 5
Full text available: pdf (843.80 KB), Publisher Site. Additional Information: full citation, abstract, references, citings, index terms, review

The notion of an adaptive user interface, one that accommodates user needs based on 
knowledge of the task at hand, is compelling but difficult to make practical. This paper 
examines models of the utility (as balancing of cost and benefit) in the initiation of task- 
specific dialog based on conditional probability of user goals in context. Illustrations in this 
paper are based on an empirical model of General Practice (GP) medicine as derived from a 
large database of GP/patient encounters. Applica ... 

3 Special section on data mining for intrusion detection and threat analysis: Data mining-based
intrusion detectors: an overview of the Columbia IDS project
Salvatore J. Stolfo, Wenke Lee, Philip K. Chan, Wei Fan, Eleazar Eskin 
December 2001 ACM SIGMOD Record, volume 30 issue 4 

Full text available: pdf (1.05 MB). Additional Information: full citation, references, citings, index terms



4 Special issue on learning from imbalanced datasets: Minority report in fraud detection:
classification of skewed data

Clifton Phua, Damminda Alahakoon, Vincent Lee 

June 2004 ACM SIGKDD Explorations Newsletter, volume 6 issue 1

Full text available: pdf (262.38 KB). Additional Information: full citation, abstract, references, citings

This paper proposes an innovative fraud detection method, built upon existing fraud 
detection research and Minority Report, to deal with the data mining problem of skewed 
data distributions. This method uses backpropagation (BP), together with naive Bayesian 
(NB) and C4.5 algorithms, on data partitions derived from minority oversampling with 
replacement. Its originality lies in the use of a single meta-classifier (stacking) to choose the 
best base classifiers, and then combine these base ... 

Keywords: fraud detection, meta-learning, multiple classifier systems 



5 Image and video digital libraries: Semantic video classification and feature subset 
selection under context and concept uncertainty 
Jianping Fan, Hangzai Luo, Jing Xiao, Lide Wu 
June 2004 

Full text available: pdf (258.04 KB). Additional Information: full citation, abstract, references, index terms

As large collections of videos become one key component of digital libraries, there is an 
urgent need of semantic video classification and feature subset selection to enable more 
effective video database organization and retrieval. However, most existing techniques for 
classifier training require a large number of labeled samples to learn correctly and suffer 
from the problems of context and concept uncertainty when only a limited number of labeled 
samples are available. To address the problems ... 

Keywords: adaptive EM algorithm, context and concept uncertainty, semantic video 
classification, unlabeled samples 



6 A Bayesian decision model for cost optimal record matching
V. S. Verykios, G. V. Moustakides, M. G. Elfeky 

May 2003 The VLDB Journal — The International Journal on Very Large Data Bases, 

Volume 12 Issue 1 

Full text available: pdf (180.87 KB). Additional Information: full citation, abstract, index terms

In an error-free system with perfectly clean data, the construction of a global view of the 
data consists of linking - in relational terms, joining - two or more tables on their key fields. 
Unfortunately, most of the time, these data are neither carefully controlled for quality nor 
necessarily defined commonly across different data sources. As a result, the creation of such 
a global data view resorts to approximate joins. In this paper, an optimal solution is 
proposed for the matching or the lin ... 

Keywords: Cost optimal statistical model, Data cleaning, Record linkage 



7 Rule-based machine learning of spatial data concepts
Steve Stearns, Daniel C. St. Clair 

February 1995 Proceedings of the 1995 ACM symposium on Applied computing 

Full text available: pdf (730.02 KB). Additional Information: full citation, references, index terms



Keywords: AQ15, classification, expert systems, geographic information systems, machine 
learning 
8 Special issue on learning from imbalanced datasets: Mining with rarity: a unifying
framework 
Gary M. Weiss 

June 2004 ACM SIGKDD Explorations Newsletter, volume 6 issue 1

Full text available: pdf (182.31 KB). Additional Information: full citation, abstract, references, citings

Rare objects are often of great interest and great value. Until recently, however, rarity has 
not received much attention in the context of data mining. Now, as increasingly complex 
real-world problems are addressed, rarity, and the related problem of imbalanced data, are 
taking center stage. This article discusses the role that rare classes and rare cases play in 
data mining. The problems that can result from these two forms of rarity are described in 
detail, as are methods for addressing these ... 

Keywords: class imbalance, cost-sensitive learning, inductive bias, rare cases, rare classes, 
sampling, small disjuncts 



9 Data transformation and duplicate detection: A generalized cost optimal decision
model for record matching
Vassilios S. Verykios, George V. Moustakides 

June 2004 Proceedings of the 2004 international workshop on Information quality in 
information systems 

Full text available: pdf (118.81 KB). Additional Information: full citation, abstract, references

Record (or entity) matching or linkage is the process of identifying records in one or more 
data sources, that refer to the same real world entity or object. In record linkage, the 
ultimate goal of a decision model is to provide the decision maker with a tool for making 
decisions upon the actual matching status of a pair of records (i.e., documents, events, 
persons, cases, etc.). Existing models of record linkage rely on decision rules that minimize 
the probability of subjecting a case to cleric ... 

Keywords: probabilistic decision model, record matching 




10 Special issue on the fusion of domain knowledge with data for decision support: 
Preference elicitation via theory refinement 

Peter Haddawy, Vu Ha, Angelo Restificar, Benjamin Geisler, John Miyamoto 
December 2003 The Journal of Machine Learning Research, volume 4 

Full text available: pdf (150.88 KB). Additional Information: full citation, abstract, references, citings, index terms

We present an approach to elicitation of user preference models in which assumptions can 
be used to guide but not constrain the elicitation process. We demonstrate that when 
domain knowledge is available, even in the form of weak and somewhat inaccurate 
assumptions, significantly less data is required to build an accurate model of user 
preferences than when no domain knowledge is provided. This approach is based on the 
KBANN (Knowledge-Based Artificial Neural Network) algorithm pioneered by Shav ... 

11 Automated learning of decision rules for text categorization
Chidanand Apte, Fred Damerau, Sholom M. Weiss 

July 1994 ACM Transactions on Information Systems (TOIS), Volume 12 Issue 3 

Full text available: pdf (1.28 MB). Additional Information: full citation, abstract, references, citings, index terms, review
We describe the results of extensive experiments using optimized rule-based induction
methods on large document collections. The goal of these methods is to discover 
automatically classification patterns that can be used for general document categorization or 
personalized filtering of free text. Previous reports indicate that human-engineered rule- 
based systems, requiring many man-years of developmental efforts, have been successfully 
built to "read" documents and assign topics ... 

12 A hierarchical access control model for video database systems 

Elisa Bertino, Jianping Fan, Elena Ferrari, Mohand-Said Hacid, Ahmed K. Elmagarmid, 
Xingquan Zhu 

April 2003 ACM Transactions on Information Systems (TOIS), volume 21 issue 2 

Full text available: pdf (6.27 MB). Additional Information: full citation, abstract, references, index terms

Content-based video database access control is becoming very important, but it depends on 
the progresses of the following related research issues: (a) efficient video analysis for 
supporting semantic visual concept representation; (b) effective video database indexing 
structure; (c) the development of suitable video database models; and (d) the development 
of access control models tailored to the characteristics of video data. In this paper, we 
propose a novel approach to support multilevel acce ... 

Keywords: Video database models, access control, indexing schemes 

13 QProber: A system for automatic classification of hidden-Web databases 
Luis Gravano, Panagiotis G. Ipeirotis, Mehran Sahami 

January 2003 ACM Transactions on Information Systems (TOIS), Volume 21 Issue 1 
Full text available: pdf (3.62 MB). Additional Information: full citation, abstract, references, index terms

The contents of many valuable Web-accessible databases are only available through search 
interfaces and are hence invisible to traditional Web "crawlers." Recently, commercial Web 
sites have started to manually organize Web-accessible databases into Yahoo!-like
hierarchical classification schemes. Here we introduce QProber, a modular system that 
automates this classification process by using a small number of query probes, generated by 
document classifiers. QProber can use a variety of types of ... 

Keywords: Database classification, Web databases, hidden Web 

14 A review of vessel extraction techniques and algorithms
Cemil Kirbas, Francis Quek 

June 2004 ACM Computing Surveys (CSUR), volume 36 issue 2 

Full text available: pdf (8.06 MB). Additional Information: full citation, abstract, references, index terms

Vessel segmentation algorithms are the critical components of circulatory blood vessel 
analysis systems. We present a survey of vessel extraction techniques and algorithms. We 
put the various vessel extraction approaches and techniques in perspective by means of a 
classification of the existing research. While we have mainly targeted the extraction of blood 
vessels, neurovascular structure in particular, we have also reviewed some of the
segmentation methods for the tubular objects that show ... 

Keywords: Magnetic resonance angiography, X-ray angiography, medical imaging, 
neurovascular, vessel extraction 

15 Improving SVM accuracy by training on auxiliary data sources
Pengcheng Wu, Thomas G. Dietterich
July 2004 Twenty-first international conference on Machine learning

Full text available: pdf (263.65 KB). Additional Information: full citation, abstract, references

The standard model of supervised learning assumes that training and test data are drawn 
from the same underlying distribution. This paper explores an application in which a second, 
auxiliary, source of data is available drawn from a different distribution. This auxiliary data is 
more plentiful, but of significantly lower quality, than the training and test data. In the SVM 
framework, a training example has two roles: (a) as a data point to constrain the learning 
process and (b) as a candidate su ... 



16 Applications of machine learning and rule induction 
Pat Langley, Herbert A. Simon 

November 1995 Communications of the ACM, volume 38 issue 11

Full text available: pdf (554.28 KB). Additional Information: full citation, abstract, references, citings, index terms, review

Machine learning is the study of computational methods for improving performance by 
mechanizing the acquisition of knowledge from experience. Expert performance requires 
much domain-specific knowledge, and knowledge engineering has produced hundreds of AI 
expert systems that are now used regularly in industry. Machine learning aims to provide 
increasing levels of automation in the knowledge engineering process, replacing much time- 
consuming human activity with automatic tec ... 



17 Industrial/government track: Clinical and financial outcomes analysis with existing
hospital patient records

R. Bharat Rao, Sathyakama Sandilya, Radu Stefan Niculescu, Colin Germond, Harsha Rao 
August 2003 Proceedings of the ninth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf (188.40 KB). Additional Information: full citation, abstract, references, index terms

Existing patient records are a valuable resource for automated outcomes analysis and 
knowledge discovery. However, key clinical data in these records is typically recorded in 
unstructured form as free text and images, and most structured clinical information is poorly 
organized. Time-consuming interpretation and analysis is required to convert these records 
into structured clinical data. Thus, only a tiny fraction of this resource is utilized. We present 
REMIND, a Bayesian Framework for Reliable ... 



Keywords: Bayes Nets, HMMs, data mining, temporal reasoning 



18 Fuzzy rule extraction from GIS data with a neural fuzzy system for decision making
Ding Zheng, Wolfgang Kainz 
19 Privacy of medical records: IT implications of HIPAA

David Baumer, Julia Brande Earp, Fay Cobb Payton 
Full text available: pdf (819.71 KB). Additional Information: full citation, abstract
Increasingly, medical records are being stored in computer databases that allow for
efficiencies in providing treatment and in the processing of clinical and financial services.
Computerization of medical records has also diminished patient privacy and, in particular,
has increased the potential for misuse, especially in the form of nonconsensual secondary
use of personally identifiable records. Organizations that store and use medical records have
had to establish security measures, prompted pa ...

Results (page 1): (((+"medical cost") and (+"training data")) and +rules)

20 Learning methods to combine linguistic indicators: improving aspectual classification 
and revealing linguistic insights 
Eric V. Siegel, Kathleen R. McKeown 

December 2000 Computational Linguistics, volume 26 issue 4 

Full text available: pdf (1.96 MB), Publisher Site Additional Information: full citation, abstract, references

Aspectual classification maps verbs to a small set of primitive categories in order to reason 
about time. This classification is necessary for interpreting temporal modifiers and assessing 
temporal relationships, and is therefore a required component for many natural language 
applications. A verb's aspectual category can be predicted by co-occurrence frequencies
between the verb and certain linguistic modifiers. These frequency measures, called 
linguistic indicators, are chosen by linguistic insi ... 

1 Data clustering: a review

A. K. Jain, M. N. Murty, P. J. Flynn 

September 1999 ACM Computing Surveys (CSUR), volume 31 issue 3 

Full text available: pdf (636.24 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Clustering is the unsupervised classification of patterns (observations, data items, or feature 
vectors) into groups (clusters). The clustering problem has been addressed in many 
contexts and by researchers in many disciplines; this reflects its broad appeal and 
usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult 
problem combinatorially, and differences in assumptions and contexts in different 
communities has made the transfer of useful generic co ... 



Keywords: cluster analysis, clustering applications, exploratory data analysis, incremental 
clustering, similarity indices, unsupervised learning 
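To make the clustering problem concrete, the following is a minimal k-means sketch in Python. k-means is one classical partitional method of the kind such surveys cover; it is chosen here purely for illustration, and the function names, the first-k initialization, and the iteration cap are assumptions, not details taken from the review.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, the dissimilarity measure assumed here."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Partition points into k clusters; returns (centroids, label per point)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: seed centroids with the first k points (deterministic, but
    # sensitive to input order; practical code usually uses random restarts).
    centroids: List[Point] = [tuple(float(v) for v in p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster keeps its centroid
        if new_centroids == centroids:  # converged: assignments cannot change
            break
        centroids = new_centroids
    return centroids, labels


centroids, labels = kmeans([(0, 0), (10, 10), (0, 1), (10, 11)], k=2)
print(labels)     # [0, 1, 0, 1]
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```

The sketch alternates assignment and mean-update steps until the centroids stop moving; its dependence on the initial seeds is one of the practical difficulties that make clustering hard in general.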



A review of vessel extraction techniques and algorithms
Cemil Kirbas, Francis Quek 

June 2004 ACM Computing Surveys (CSUR), volume 36 issue 2 

Full text available: pdf (8.06 MB) Additional Information: full citation, abstract, references, index terms

Vessel segmentation algorithms are the critical components of circulatory blood vessel 
analysis systems. We present a survey of vessel extraction techniques and algorithms. We 
put the various vessel extraction approaches and techniques in perspective by means of a 
classification of the existing research. While we have mainly targeted the extraction of blood 
vessels, neurovascular structure in particular, we have also reviewed some of the
segmentation methods for the tubular objects that show ... 

Keywords: Magnetic resonance angiography, X-ray angiography, medical imaging, 
neurovascular, vessel extraction 




Computational strategies for object recognition 

Paul Suetens, Pascal Fua, Andrew J. Hanson 

March 1992 ACM Computing Surveys (CSUR), Volume 24 Issue 1 

This article reviews the available methods for automated identification of objects in digital
images. The techniques are classified into groups according to the nature of the 
computational strategy used. Four classes are proposed: (1) the simplest strategies, which 
work on data appropriate for feature vector classification, (2) methods that match models to 
symbolic data structures for situations involving reliable data and complex models, (3) 
approaches that fit models to the photometry and ... 

Keywords: image understanding, model-based vision, object recognition 



4 Machine learning in automated text categorization 
Fabrizio Sebastiani 

March 2002 ACM Computing Surveys (CSUR), volume 34 issue 1

Full text available: pdf (524.41 KB) Additional Information: full citation, abstract, references, citings, index terms

The automated categorization (or classification) of texts into predefined categories has 
witnessed a booming interest in the last 10 years, due to the increased availability of 
documents in digital form and the ensuing need to organize them. In the research 
community the dominant approach to this problem is based on machine learning techniques: 
a general inductive process automatically builds a classifier by learning, from a set of 
preclassified documents, the characteristics of the categories. ... 

Keywords: Machine learning, text categorization, text classification 
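A minimal sketch of the inductive approach this abstract describes, in which a classifier is learned from a set of preclassified documents. Multinomial naive Bayes with add-one smoothing and whitespace tokenization is swapped in here as one simple instance of that approach; the function names and the toy corpus are hypothetical and are not taken from the survey.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_nb(docs, labels):
    """Learn class priors and per-class term counts from preclassified documents."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab


def classify_nb(model, doc):
    """Return the category with the highest log posterior (add-one smoothing)."""
    class_counts, term_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total_terms = sum(term_counts[label].values())
        score = math.log(count / n_docs)  # log prior
        for token in tokenize(doc):
            if token in vocab:  # ignore terms never seen in training
                score += math.log((term_counts[label][token] + 1)
                                  / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = ["cheap pills buy now", "meeting agenda attached",
        "buy cheap watches", "project meeting notes"]
labels = ["spam", "ham", "spam", "ham"]
model = train_nb(docs, labels)
print(classify_nb(model, "buy cheap"))  # spam
```

Nothing about the categories is hand-coded: everything the classifier knows comes from the labelled examples, which is the contrast with knowledge-engineered rule systems that such surveys draw.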



5 A hierarchical access control model for video database systems 

Elisa Bertino, Jianping Fan, Elena Ferrari, Mohand-Said Hacid, Ahmed K. Elmagarmid, 
Xingquan Zhu 

April 2003 ACM Transactions on Information Systems (TOIS), Volume 21 Issue 2 
Full text available: pdf (6.27 MB) Additional Information: full citation, abstract, references, index terms

Content-based video database access control is becoming very important, but it depends on 
the progresses of the following related research issues: (a) efficient video analysis for 
supporting semantic visual concept representation; (b) effective video database indexing 
structure; (c) the development of suitable video database models; and (d) the development 
of access control models tailored to the characteristics of video data. In this paper, we 
propose a novel approach to support multilevel acce ... 

Keywords: Video database models, access control, indexing schemes 

Steve Stearns, Daniel C. St. Clair

February 1995 Proceedings of the 1995 ACM symposium on Applied computing 

Full text available: pdf (730.02 KB) Additional Information: full citation, references, index terms

learning
Chidanand Apte, Fred Damerau, Sholom M. Weiss

July 1994 ACM Transactions on Information Systems (TOIS), volume 12 issue 3 

Full text available: pdf (1.28 MB) Additional Information: full citation, abstract, references, citings, index terms, review

We describe the results of extensive experiments using optimized rule-based induction 
methods on large document collections. The goal of these methods is to discover 
automatically classification patterns that can be used for general document categorization or 
personalized filtering of free text. Previous reports indicate that human-engineered rule- 
based systems, requiring many man-years of developmental efforts, have been successfully 
built to "read" documents and assign topics ... 



8 Model-based recognition in robot vision 
Roland T. Chin, Charles R. Dyer 

March 1986 ACM Computing Surveys (CSUR), volume 18 issue 1 

Full text available: pdf (4.94 MB) Additional Information: full citation, abstract, references, citings, index terms, review

This paper presents a comparative study and survey of model-based object-recognition 
algorithms for robot vision. The goal of these algorithms is to recognize the identity, 
position, and orientation of randomly oriented industrial parts. In one form this is commonly 
referred to as the "bin-picking" problem, in which the parts to be recognized are presented 
in a jumbled bin. The paper is organized according to 2-D, 2½-D, and 3-D object
representations, which are used as the basis for ... 

9 Computing curricula 2001

September 2001 Journal on Educational Resources in Computing (JERIC) 

Full text available: pdf (613.63 KB), html (2.78 KB) Additional Information: full citation, references, citings, index terms



10 An efficient boosting algorithm for combining preferences 
Yoav Freund, Raj Iyer, Robert E. Schapire, Yoram Singer 
December 2003 The Journal of Machine Learning Research, volume 4 

Full text available: pdf (392.20 KB) Additional Information: full citation, abstract, index terms

We study the problem of learning to accurately rank a set of objects by combining a given 
collection of ranking or preference functions. This problem of combining preferences arises 
in several applications, such as that of combining the results of different search engines, or 
the "collaborative-filtering" problem of ranking movies for a user based on the movie 
rankings provided by other users. In this work, we begin by presenting a formal framework 
for this general problem. We then describe and ... 




Improving SVM accuracy by training on auxiliary data sources
Pengcheng Wu, Thomas G. Dietterich 

July 2004 Twenty-first international conference on Machine learning 

Full text available: pdf (263.65 KB) Additional Information: full citation, abstract, references

The standard model of supervised learning assumes that training and test data are drawn 
from the same underlying distribution. This paper explores an application in which a second, 
auxiliary, source of data is available drawn from a different distribution. This auxiliary data is 
more plentiful, but of significantly lower quality, than the training and test data. In the SVM 
framework, a training example has two roles: (a) as a data point to constrain the learning 
process and (b) as a candidate su ... 

12 Poster papers: Tumor cell identification using features rules
Bin Fang, Wynne Hsu, Mong Li Lee 

July 2002 Proceedings of the eighth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf (152.89 KB) Additional Information: full citation, abstract, references, citings, index terms

Advances in imaging techniques have led to large repositories of images. There is an 
increasing demand for automated systems that can analyze complex medical images and 
extract meaningful information for mining patterns. Here, we describe a real-life image 
mining application to the problem of tumour cell counting. The quantitative analysis of 
tumour cells is fundamental to characterizing the activity of tumour cells. Existing 
approaches are mostly manual, time-consuming and subjective. Efforts t ... 

Keywords: dynamic water immersion, features rules, identification, local adaptive 
thresholding, majority vote, meta classifier, weighted vote 



13 Trading MIPS and memory for knowledge engineering

Robert H. Creecy, Brij M. Masand, Stephen J. Smith, David L. Waltz 
August 1992 Communications of the ACM, volume 35 issue 8 

Full text available: pdf (7.46 MB) Additional Information: full citation, references, citings, index terms, review



Keywords: automated system building, case-based reasoning, empirical learning, memory- 
based reasoning, textual database classification 



Three-dimensional medical imaging: algorithms and computer systems 
M. R. Stytz, G. Frieder, O. Frieder 

December 1991 ACM Computing Surveys (CSUR), volume 23 issue 4 

Full text available: pdf (7.38 MB) Additional Information: full citation, references, citings, index terms, review



Keywords: Computer graphics, medical imaging, surface rendering, three-dimensional 

15 Face recognition: A literature survey

W. Zhao, R. Chellappa, P. J. Phillips, A. Rosenfeld 

December 2003 ACM Computing Surveys (CSUR), volume 35 issue 4 

Full text available: pdf (4.28 MB) Additional Information: full citation, abstract, references, index terms

As one of the most successful applications of image analysis and understanding, face 
recognition has recently received significant attention, especially during the past several 
years. At least two reasons account for this trend: the first is the wide range of commercial 
and law enforcement applications, and the second is the availability of feasible technologies 
after 30 years of research. Even though current machine recognition systems have reached 
a certain level of maturity, their success is ... 

Keywords: Face recognition, person identification 

Tao Li, Qi Li, Shenghuo Zhu, Mitsunori Ogihara

December 2002 ACM SIGKDD Explorations Newsletter, volume 4 issue 2 

Full text available: pdf (330.06 KB) Additional Information: full citation, abstract, references, citings

Recently there has been significant development in the use of wavelet methods in various 
data mining processes. However, no comprehensive survey of the topic has been available.
The goal of this paper is to fill the void. First, the paper presents a high-level
data-mining framework that reduces the overall process into smaller components. Then
applications of wavelets for each component are reviewed. The paper concludes by discussing
the impact of wavelets on data mining research an ... 

17 Model selection via the AUC 
Saharon Rosset 

July 2004 Twenty-first international conference on Machine learning 

Full text available: pdf (237.64 KB) Additional Information: full citation, abstract, references

We present a statistical analysis of the AUC as an evaluation criterion for classification 
scoring models. First, we consider significance tests for the difference between AUC scores 
of two algorithms on the same test set. We derive exact moments under simplifying 
assumptions and use them to examine approximate practical methods from the literature. 
We then compare AUC to empirical misclassification error when the prediction goal is to 
minimize future error rate. We show that the AUC may ... 
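The empirical AUC of a scoring model equals the fraction of (positive, negative) pairs that the scores rank correctly, with ties counted as one half. The sketch below computes that quantity alongside a thresholded misclassification error, the two criteria this abstract compares; the 0.5 threshold, the function names, and the toy scores are assumptions for illustration and do not reproduce the paper's analysis.

```python
from itertools import product


def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p, n in product(pos, neg):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos) * len(neg))


def error_rate(scores, labels, threshold=0.5):
    """Misclassification rate when predicting class 1 for score >= threshold."""
    wrong = sum((s >= threshold) != (y == 1) for s, y in zip(scores, labels))
    return wrong / len(labels)


scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(empirical_auc(scores, labels))  # 0.75
print(error_rate(scores, labels))     # 0.5
```

On the same scores, three of the four pairs are ranked correctly while half of the thresholded predictions are wrong, which illustrates why the choice between AUC and error rate can matter when selecting a model.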

18 Applications of machine learning and rule induction
Pat Langley, Herbert A. Simon 

November 1995 Communications of the ACM, volume 38 issue 11

Full text available: pdf (554.28 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Machine learning is the study of computational methods for improving performance by 
mechanizing the acquisition of knowledge from experience. Expert performance requires 
much domain-specific knowledge, and knowledge engineering has produced hundreds of AI 
expert systems that are now used regularly in industry. Machine learning aims to provide 
increasing levels of automation in the knowledge engineering process, replacing much time- 
consuming human activity with automatic tec ... 

19 Theory of keyblock-based image retrieval 
April 2002 ACM Transactions on Information Systems (TOIS), Volume 20 Issue 2 

Full text available: pdf (2.14 MB) Additional Information: full citation, abstract, references, index terms, review

The success of text-based retrieval motivates us to investigate analogous techniques which 
can support the querying and browsing of image data. However, images differ significantly 
from text both syntactically and semantically in their mode of representing and expressing 
information. Thus, the generalization of information retrieval from the text domain to the 
image domain is non-trivial. This paper presents a framework for information retrieval in the 
image domain which supports content-based q ... 

Keywords: clustering, codebook, content-based image retrieval, keyblock 

July 2000 Proceedings of the 23rd annual international ACM SIGIR conference on
Research and development in information retrieval

This paper explores the use of hierarchical structure for classifying a large, heterogeneous
collection of web content. The hierarchical structure is initially used to train different second- 
level classifiers. In the hierarchical case, a model is learned to distinguish a second-level 
category from other categories within the same top level. In the flat non-hierarchical case, a 
model distinguishes a second-level category from all other second-level categories. Scoring 
rules can further take ad ... 

Keywords: Web hierarchies, classification, hierarchical models, machine learning, support 
vector machines, text categorization, text classification

Machine learning in automated text categorization
Fabrizio Sebastiani 

March 2002 ACM Computing Surveys (CSUR), volume 34 issue 1

Full text available: pdf (524.41 KB) Additional Information: full citation, abstract, references, citings, index terms



The automated categorization (or classification) of texts into predefined categories has 
witnessed a booming interest in the last 10 years, due to the increased availability of 
documents in digital form and the ensuing need to organize them. In the research 
community the dominant approach to this problem is based on machine learning techniques: 
a general inductive process automatically builds a classifier by learning, from a set of 
preclassified documents, the characteristics of the categories. ... 



Keywords: Machine learning, text categorization, text classification 

3 Trading MIPS and memory for knowledge engineering

Robert H. Creecy, Brij M. Masand, Stephen J. Smith, David L. Waltz 
August 1992 Communications of the ACM, Volume 35 issue 8 

Full text available: pdf (7.46 MB) Additional Information: full citation, references, citings, index terms, review

Full text available: pdf (392.20 KB) Additional Information: full citation, abstract, index terms

We study the problem of learning to accurately rank a set of objects by combining a given 
collection of ranking or preference functions. This problem of combining preferences arises 
in several applications, such as that of combining the results of different search engines, or 
the "collaborative-filtering" problem of ranking movies for a user based on the movie 
rankings provided by other users. In this work, we begin by presenting a formal framework 
for this general problem. We then describe and ... 

5 Face recognition: A literature survey 

W. Zhao, R. Chellappa, P. J. Phillips, A. Rosenfeld 
December 2003 ACM Computing Surveys (CSUR), volume 35 issue 4 

Full text available: pdf (4.28 MB) Additional Information: full citation, abstract, references, index terms

As one of the most successful applications of image analysis and understanding, face 
recognition has recently received significant attention, especially during the past several 
years. At least two reasons account for this trend: the first is the wide range of commercial 
and law enforcement applications, and the second is the availability of feasible technologies 
after 30 years of research. Even though current machine recognition systems have reached 
a certain level of maturity, their success is ... 

Keywords: Face recognition, person identification 



6 Applications of machine learning and rule induction 
Pat Langley, Herbert A. Simon 

November 1995 Communications of the ACM, volume 38 issue 11

Full text available: pdf (554.28 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Machine learning is the study of computational methods for improving performance by 
mechanizing the acquisition of knowledge from experience. Expert performance requires 
much domain-specific knowledge, and knowledge engineering has produced hundreds of AI 
expert systems that are now used regularly in industry. Machine learning aims to provide 
increasing levels of automation in the knowledge engineering process, replacing much time- 
consuming human activity with automatic tec ... 

7 The Hearsay-II Speech-Understanding System: Integrating Knowledge to Resolve
Uncertainty

Lee D. Erman, Frederick Hayes-Roth, Victor R. Lesser, D. Raj Reddy 
June 1980 ACM Computing Surveys (CSUR), volume 12 issue 2 

Full text available: pdf (3.83 MB) Additional Information: full citation, references, citings, index terms






Data clustering: a review 

A. K. Jain, M. N. Murty, P. J. Flynn 

September 1999 ACM Computing Surveys (CSUR), Volume 31 Issue 3 

Full text available: pdf (636.24 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Clustering is the unsupervised classification of patterns (observations, data items, or feature
vectors) into groups (clusters). The clustering problem has been addressed in many 
contexts and by researchers in many disciplines; this reflects its broad appeal and 
usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult 
problem combinatorially, and differences in assumptions and contexts in different
communities has made the transfer of useful generic co ... 

Keywords: cluster analysis, clustering applications, exploratory data analysis, incremental 
clustering, similarity indices, unsupervised learning 



Fast detection of communication patterns in distributed executions 
Thomas Kunz, Michiel F. H. Seuren 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced Studies 
on Collaborative research 

Full text available: pdf (4.21 MB) Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based on 
process-time diagrams are often used to obtain a better understanding of the execution of 
the application. The visualization tool we use is Poet, an event tracer developed at the 
University of Waterloo. However, these diagrams are often very complex and do not provide 
the user with the desired overview of the application. In our experience, such tools display 
repeated occurrences of non-trivial commun ... 



10 Model selection via the AUC 
Saharon Rosset 

July 2004 Twenty-first international conference on Machine learning 

Full text available: pdf (237.64 KB) Additional Information: full citation, abstract, references

We present a statistical analysis of the AUC as an evaluation criterion for classification 
scoring models. First, we consider significance tests for the difference between AUC scores 
of two algorithms on the same test set. We derive exact moments under simplifying 
assumptions and use them to examine approximate practical methods from the literature. 
We then compare AUC to empirical misclassification error when the prediction goal is to 
minimize future error rate. We show that the AUC may ... 



A review of vessel extraction techniques and algorithms 
Cemil Kirbas, Francis Quek 

June 2004 ACM Computing Surveys (CSUR), volume 36 issue 2 

Full text available: pdf (8.06 MB) Additional Information: full citation, abstract, references, index terms

Vessel segmentation algorithms are the critical components of circulatory blood vessel 
analysis systems. We present a survey of vessel extraction techniques and algorithms. We 
put the various vessel extraction approaches and techniques in perspective by means of a 
classification of the existing research. While we have mainly targeted the extraction of blood 
vessels, neurovascular structure in particular, we have also reviewed some of the
segmentation methods for the tubular objects that show ... 

Keywords: Magnetic resonance angiography, X-ray angiography, medical imaging,
neurovascular, vessel extraction

This article reviews the available methods for automated identification of objects in digital
images. The techniques are classified into groups according to the nature of the 
computational strategy used. Four classes are proposed: (1) the simplest strategies, which
work on data appropriate for feature vector classification, (2) methods that match models to 
symbolic data structures for situations involving reliable data and complex models, (3) 
approaches that fit models to the photometry and ... 

Keywords: image understanding, model-based vision, object recognition 



13 Class prediction and discovery using gene expression data 
Donna K. Slonim, Pablo Tamayo, Jill P. Mesirov, Todd R. Golub, Eric S. Lander 
April 2000 Proceedings of the fourth annual international conference on
Computational molecular biology

Full text available: pdf (858.00 KB) Additional Information: full citation, abstract, references, citings

Classification of patient samples is a crucial aspect of cancer diagnosis and treatment. We 
present a method for classifying samples by computational analysis of gene expression data. 
We consider the classification problem in two parts: class discovery and class prediction. 
Class discovery refers to the process of dividing samples into reproducible classes that have 
similar behavior or properties, while class prediction places new samples into already known 
classes. We describe ... 

14 Multi Relational Data Mining (MRDM): Multi-relational data mining: an introduction 
Saso Dzeroski 

July 2003 ACM SIGKDD Explorations Newsletter, volume 5 issue 1
Full text available: pdf (1.71 MB) Additional Information: full citation, abstract, references, citings

Data mining algorithms look for patterns in data. While most existing data mining 
approaches look for patterns in a single data table, multi-relational data mining (MRDM) 
approaches look for patterns that involve multiple tables (relations) from a relational 
database. In recent years, the most common types of patterns and approaches considered 
in data mining have been extended to the multi-relational case and MRDM now 
encompasses multi-relational (MR) association rule discovery, MR decision tree ... 

Grapheme-to-phoneme conversion (GTPC) has been achieved in most European languages
by dictionary look-up or using rules. The application of these methods, however, in the 
reverse process, (i.e., in phoneme-to-grapheme conversion [PTGC]) creates serious 
problems, especially in inflectionally rich languages. In this paper the PTGC problem is 
approached from a completely different point of view. Instead of rules or a dictionary, the 
statistics of language connecting pronunciation to spelling are ex ... 

Registration is a fundamental task in image processing used to match two or more pictures
taken, for example, at different times, from different sensors, or from different viewpoints. 
Virtually all large systems which evaluate images require the registration of images, or a 
closely related operation, as an intermediate step. Specific examples of systems where 
image registration is a significant component include matching a target with a real-time 
image of a scene for target recognition, mon ... 

Constantly improving gene expression profiling technologies are expected to provide
understanding and insight into cancer-related cellular processes. Gene expression data is
also expected to significantly aid in the development of efficient cancer diagnosis and
classification platforms. In this work we examine two sets of gene expression data measured
across sets of tumor and normal clinical samples. One set consists of 2,000 genes, measured
in 62 epithelial colon samples [1]. The second consi ... 

Many tasks require "reasoning" (i.e., deriving conclusions from a corpus of explicitly stored
information) to solve their range of problems. An ideal reasoning system would produce
all-and-only the correct answers to every possible query, produce answers that are as specific
as possible, be expressive enough to permit any possible fact to be stored and any possible 
query to be asked, and be (time) efficient 

Modern relational database systems are beginning to support ad hoc queries on mining
models. In this article, we explore novel techniques for optimizing queries that contain 
predicates on the results of application of mining models to relational data. For such queries, 
we use the internal structure of the mining model to automatically derive traditional 
database predicates. We present algorithms for deriving such predicates for a large class of 
popular discrete mining models: decision trees, nai ... 
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The automated categorization (or classification) of texts into predefined categories has 
witnessed a booming interest in the last 10 years, due to the increased availability of 
documents in digital form and the ensuing need to organize them. In the research 
community the dominant approach to this problem is based on machine learning techniques: 
a general inductive process automatically builds a classifier by learning, from a set of 
preclassified documents, the characteristics of the categories. ... 
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We study the problem of learning to accurately rank a set of objects by combining a given 
collection of ranking or preference functions. This problem of combining preferences arises 
in several applications, such as that of combining the results of different search engines, or 
the "collaborative-filtering" problem of ranking movies for a user based on the movie 
rankings provided by other users. In this work, we begin by presenting a formal framework 
for this general problem. We then describe and ... 
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As one of the most successful applications of image analysis and understanding, face 
recognition has recently received significant attention, especially during the past several 
years. At least two reasons account for this trend: the first is the wide range of commercial 
and law enforcement applications, and the second is the availability of feasible technologies 
after 30 years of research. Even though current machine recognition systems have reached 
a certain level of maturity, their success is ... 
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Understanding distributed applications is a tedious and difficult task. Visualizations based on 
process-time diagrams are often used to obtain a better understanding of the execution of 
the application. The visualization tool we use is Poet, an event tracer developed at the 
University of Waterloo. However, these diagrams are often very complex and do not provide 
the user with the desired overview of the application. In our experience, such tools display 
repeated occurrences of non-trivial commun ... 
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Machine learning is the study of computational methods for improving performance by 
mechanizing the acquisition of knowledge from experience. Expert performance requires 
much domain-specific knowledge, and knowledge engineering has produced hundreds of AI 
expert systems that are now used regularly in industry. Machine learning aims to provide 
increasing levels of automation in the knowledge engineering process, replacing much time- 
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Data mining algorithms look for patterns in data. While most existing data mining 
approaches look for patterns in a single data table, multi-relational data mining (MRDM) 
approaches look for patterns that involve multiple tables (relations) from a relational 
database. In recent years, the most common types of patterns and approaches considered 
in data mining have been extended to the multi-relational case and MRDM now 
encompasses multi-relational (MR) association rule discovery, MR decision tree ... 
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We present a statistical analysis of the AUC as an evaluation criterion for classification 
scoring models. First, we consider significance tests for the difference between AUC scores 
of two algorithms on the same test set. We derive exact moments under simplifying 
assumptions and use them to examine approximate practical methods from the literature. 
We then compare AUC to empirical misclassification error when the prediction goal is to 
minimize future error rate. We show that the AUC may ... 
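As a minimal sketch (not the authors' code), the AUC discussed in this abstract can be computed through its standard pairwise interpretation: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function names `auc_pairwise` and `error_rate`, the 0.5 threshold, and the toy scores are illustrative assumptions; the second function shows the empirical misclassification error that the abstract contrasts with AUC.

```python
from typing import Sequence


def auc_pairwise(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win. This O(P * N) loop is written for clarity;
    a rank-based formulation gives the same value in O(n log n).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def error_rate(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Empirical misclassification error after thresholding the scores (hypothetical cut-off)."""
    mistakes = sum((s >= threshold) != (y == 1) for s, y in zip(scores, labels))
    return mistakes / len(labels)


scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc_pairwise(scores, labels))  # 0.75
print(error_rate(scores, labels))    # 0.5
```

The same scores give an AUC of 0.75 but a 50% error rate at the chosen threshold: AUC depends only on how the scores order the examples, while error rate also depends on where the threshold falls.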
Grapheme-to-phoneme conversion (GTPC) has been achieved in most European languages
by dictionary look-up or using rules. The application of these methods, however, in the
reverse process (i.e., in phoneme-to-grapheme conversion [PTGC]) creates serious
problems, especially in inflectionally rich languages. In this paper the PTGC problem is 
approached from a completely different point of view. Instead of rules or a dictionary, the 
statistics of language connecting pronunciation to spelling are ex ... 
Classification of patient samples is a crucial aspect of cancer diagnosis and treatment. We
present a method for classifying samples by computational analysis of gene expression data. 
We consider the classification problem in two parts: class discovery and class prediction. 
Class discovery refers to the process of dividing samples into reproducible classes that have 
similar behavior or properties, while class prediction places new samples into already known 
classes. We describe ... 
Registration is a fundamental task in image processing used to match two or more pictures
taken, for example, at different times, from different sensors, or from different viewpoints. 
Virtually all large systems which evaluate images require the registration of images, or a 
closely related operation, as an intermediate step. Specific examples of systems where 
image registration is a significant component include matching a target with a real-time 
image of a scene for target recognition, mon ... 
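One of the simplest registration methods, exhaustive search over integer translations that minimizes the mean squared intensity difference over the overlapping pixels, can be sketched as follows. This is an illustrative assumption rather than a technique taken from the survey, which covers far more general transformations; the function name `register_translation` and the list-of-lists image representation are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # row-major grayscale image


def register_translation(ref: Image, moving: Image, max_shift: int) -> Tuple[int, int]:
    """Return the (dy, dx) for which moving[y][x] best matches ref[y + dy][x + dx].

    Each candidate shift is scored by the mean squared difference over the
    pixels where the two images overlap; shifts with no overlap are skipped.
    """
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            for y, row in enumerate(moving):
                ry = y + dy
                if not 0 <= ry < len(ref):
                    continue
                for x, value in enumerate(row):
                    rx = x + dx
                    if 0 <= rx < len(ref[ry]):
                        diff = ref[ry][rx] - value
                        total += diff * diff
                        count += 1
            if count and total / count < best_err:
                best, best_err = (dy, dx), total / count
    return best


# A 6x6 reference and a 4x4 patch cut from it at offset (1, 2).
ref = [[10.0 * y + x for x in range(6)] for y in range(6)]
moving = [[ref[y + 1][x + 2] for x in range(4)] for y in range(4)]
print(register_translation(ref, moving, max_shift=2))  # (1, 2)
```

The search tries (2 * max_shift + 1)^2 candidate shifts, each costing one pass over the overlap, so it only suits small offsets; larger or non-rigid misalignments call for the more general methods such a survey reviews.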
Many tasks require "reasoning" (i.e., deriving conclusions from a corpus of explicitly stored
information) to solve their range of problems. An ideal reasoning system would produce
all-and-only the correct answers to every possible query, produce answers that are as specific
as possible, be expressive enough to permit any possible fact to be stored and any possible
query to be asked, and be (time) efficient.
Modern relational database systems are beginning to support ad hoc queries on mining
models. In this article, we explore novel techniques for optimizing queries that contain 
predicates on the results of application of mining models to relational data. For such queries, 
we use the internal structure of the mining model to automatically derive traditional 
database predicates. We present algorithms for deriving such predicates for a large class of 
popular discrete mining models: decision trees, nai ... 
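The core idea described here, walking a mining model's internal structure to obtain an ordinary predicate that a query optimizer can use, can be illustrated for a decision tree: the rows predicted as a given class are exactly those satisfying the disjunction of the root-to-leaf paths that end in that class. This is a hedged illustration, not the article's algorithm; the `Leaf`/`Split` classes, the `<=`/`>` split convention, and the column names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Split:
    column: str
    threshold: float
    left: "Node"   # followed when column <= threshold
    right: "Node"  # followed when column > threshold


Node = Union[Leaf, Split]


def paths_to(node: Node, target: str, path: Tuple[str, ...] = ()) -> List[List[str]]:
    """Collect the conjunction of split conditions along every path that ends in `target`."""
    if isinstance(node, Leaf):
        return [list(path)] if node.label == target else []
    return (paths_to(node.left, target, path + (f"{node.column} <= {node.threshold}",))
            + paths_to(node.right, target, path + (f"{node.column} > {node.threshold}",)))


def to_sql(conjunctions: List[List[str]]) -> str:
    """Render the paths as a SQL-style predicate in disjunctive normal form."""
    if not conjunctions:
        return "FALSE"  # the model never predicts this class
    return " OR ".join("(" + " AND ".join(c) + ")" if c else "TRUE" for c in conjunctions)


tree = Split("age", 30, Leaf("low"), Split("income", 50000, Leaf("low"), Leaf("high")))
print(to_sql(paths_to(tree, "high")))  # (age > 30 AND income > 50000)
```

A derived predicate such as `age > 30 AND income > 50000` could then be handed to an index or scan filter, so that rows the model cannot label `high` are never passed to the model at all.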
Constantly improving gene expression profiling technologies are expected to provide
understanding and insight into cancer-related cellular processes. Gene expression data is
also expected to significantly aid in the development of efficient cancer diagnosis and
classification platforms. In this work we examine two sets of gene expression data measured
across sets of tumor and normal clinical samples. One set consists of 2,000 genes, measured
in 62 epithelial colon samples [1]. The second consi ...
In a variety of practical situations such as reverse engineering of boundary representation
from depth maps of scanned objects, range data analysis, model-based recognition and 
algebraic surface design, there is a need to recover the shape of visible surfaces of a dense 
3D point set. In particular, it is desirable to identify and fit simple surfaces of known type 
wherever these are in reasonable agreement with the data. We are interested in the class of 
quadric surfaces, that is, algebraic surfa ... 
Recommender systems have been evaluated in many, often incomparable, ways. In this
article, we review the key decisions in evaluating collaborative filtering recommender
systems: the user tasks being evaluated, the types of analysis and datasets being used, the
ways in which prediction quality is measured, the evaluation of prediction attributes other
than quality, and the user-based evaluation of the system as a whole. In addition to
reviewing the evaluation strategies used by prior researchers ...

http://portal.acm.org/results.c 2/16/05

Results (page 1): (((+"medical cost") and (+"training data")) and (+first +second +third +... Page 3 of 6




A survey of image registration techniques 
Lisa Gottesfeld Brown 

December 1992 ACM Computing Surveys (CSUR), volume 24 issue 4 

Full text available: pdf (5.20 MB) Additional Information: full citation, abstract, references, citings, index terms, review
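Registration is a fundamental task in image processing used to match two or more pictures
taken, for example, at different times, from different sensors, or from different viewpoints.
Virtually all large systems which evaluate images require the registration of images, or a
closely related operation, as an intermediate step. Specific examples of systems where
image registration is a significant component include matching a target with a real-time
image of a scene for target recognition, mon ...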
Many software process methods and tools presuppose the existence of a formal model of a
process. Unfortunately, developing a formal model for an on-going, complex process can be 
difficult, costly, and error prone. This presents a practical barrier to the adoption of process 
technologies, which would be lowered by automated assistance in creating formal models. 
To this end, we have developed a data analysis technique that we term process discovery. 
Under this technique, data ... 



20 Octrees for faster isosurface generation
Jane Wilhelms, Allen Van Gelder

July 1992 ACM Transactions on Graphics (TOG), volume 11 issue 3
The large size of many volume data sets often prevents visualization algorithms from
providing interactive rendering. The use of hierarchical data structures can ameliorate this
problem by storing summary information to prevent useless exploration of regions of little or
no current interest within the volume. This paper discusses research into the use of the
octree hierarchical data structure when the regions of current interest can vary during the ...
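A common way to realize the "summary information" mentioned here is a min/max octree: each node records the smallest and largest sample value in its subvolume, so an isosurface query can skip any subtree whose range does not contain the isovalue. The sketch below assumes a cubic grid of vertex samples with a power-of-two number of cells per axis; `OctNode`, `build`, and `active_cells` are hypothetical names, and the paper's own octree construction may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Grid = List[List[List[float]]]  # grid[x][y][z]: samples at cell corners


@dataclass
class OctNode:
    origin: Tuple[int, int, int]  # lowest cell index covered by this node
    size: int                     # edge length in cells (a power of two)
    lo: float                     # minimum sample value in the subvolume
    hi: float                     # maximum sample value in the subvolume
    children: List["OctNode"] = field(default_factory=list)


def build(grid: Grid, origin: Tuple[int, int, int] = (0, 0, 0), size: Optional[int] = None) -> OctNode:
    """Recursively build a min/max octree over the cells of `grid`."""
    if size is None:
        size = len(grid) - 1  # cells per axis; assumed to be a power of two
    x0, y0, z0 = origin
    if size == 1:
        corners = [grid[x0 + i][y0 + j][z0 + k] for i in (0, 1) for j in (0, 1) for k in (0, 1)]
        return OctNode(origin, 1, min(corners), max(corners))
    half = size // 2
    kids = [build(grid, (x0 + i * half, y0 + j * half, z0 + k * half), half)
            for i in (0, 1) for j in (0, 1) for k in (0, 1)]
    return OctNode(origin, size, min(c.lo for c in kids), max(c.hi for c in kids), kids)


def active_cells(node: OctNode, iso: float, out: Optional[List[Tuple[int, int, int]]] = None):
    """Return the unit cells whose value range contains `iso`, pruning subtrees that cannot."""
    if out is None:
        out = []
    if not node.lo <= iso <= node.hi:
        return out  # the summary rules this whole subvolume out
    if node.size == 1:
        out.append(node.origin)
    else:
        for child in node.children:
            active_cells(child, iso, out)
    return out


# 4 x 4 x 4 cells whose samples equal their x coordinate: the surface x = 1.5
# passes only through the 16 cells in the slab 1 <= x <= 2.
n = 4
grid = [[[float(x) for _ in range(n + 1)] for _ in range(n + 1)] for x in range(n + 1)]
cells = active_cells(build(grid), 1.5)
print(len(cells))  # 16
```

Only the cells returned by `active_cells` would need to be passed to a surface-extraction step; the pruning is what avoids the "useless exploration" the abstract describes.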
Full text available: pdf (4.28 MB) Additional Information: full citation, abstract, references, index terms

As one of the most successful applications of image analysis and understanding, face recognition has
recently received significant attention, especially during the past several years. At least two reasons
account for this trend: the first is the wide range of commercial and law enforcement applications, and
the second is the availability of feasible technologies after 30 years of research. Even though current
machine recognition systems have reached a certain level of maturity, their success is ...
The automated categorization (or classification) of texts into predefined categories has witnessed a
booming interest in the last 10 years, due to the increased availability of documents in digital form and
the ensuing need to organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process automatically builds a classifier by
learning, from a set of preclassified documents, the characteristics of the categories. ...
We study the problem of learning to accurately rank a set of objects by combining a given collection of
ranking or preference functions. This problem of combining preferences arises in several applications,
such as that of combining the results of different search engines, or the "collaborative-filtering" problem
of ranking movies for a user based on the movie rankings provided by other users. In this work, we
begin by presenting a formal framework for this general problem. We then describe and ...
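The problem framed here (merging several ranking or preference functions into one ordering) can be sketched as a weighted sum of pairwise preferences followed by a greedy ordering step that repeatedly places the item with the largest net preference over the items still unplaced. This is a hedged illustration under stated assumptions: the weights are fixed by hand rather than learned, each preference function returns a value in [0, 1], and `from_ranking`, `combine`, and `greedy_order` are hypothetical names rather than the paper's API.

```python
from typing import Callable, Hashable, List, Sequence

# pref(u, v) in [0, 1]: how strongly a source believes u should rank above v.
Pref = Callable[[Hashable, Hashable], float]


def from_ranking(ranking: Sequence[Hashable]) -> Pref:
    """Turn one source's ranked list (e.g. a search engine's results) into a preference function."""
    pos = {item: i for i, item in enumerate(ranking)}

    def pref(u: Hashable, v: Hashable) -> float:
        if u in pos and v in pos:
            return 1.0 if pos[u] < pos[v] else 0.0
        return 0.5  # the source does not rank both items: no opinion

    return pref


def combine(prefs: Sequence[Pref], weights: Sequence[float]) -> Pref:
    """Weighted combination of several preference functions."""
    def combined(u: Hashable, v: Hashable) -> float:
        return sum(w * p(u, v) for p, w in zip(prefs, weights))
    return combined


def greedy_order(items: Sequence[Hashable], pref: Pref) -> List[Hashable]:
    """Repeatedly place the item whose outgoing preference most exceeds its incoming preference."""
    remaining = list(items)
    order: List[Hashable] = []
    while remaining:
        def potential(v: Hashable) -> float:
            return sum(pref(v, u) - pref(u, v) for u in remaining if u != v)
        best = max(remaining, key=potential)
        order.append(best)
        remaining.remove(best)
    return order


engine_a = from_ranking(["a", "b", "c"])
engine_b = from_ranking(["b", "a", "c"])
print(greedy_order(["c", "b", "a"], combine([engine_a, engine_b], [0.7, 0.3])))  # ['a', 'b', 'c']
```

With weights (0.7, 0.3) the first source's order wins; swapping the weights yields `['b', 'a', 'c']`. Learning good weights from feedback is the part of the problem this sketch leaves out.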
Full text available: pdf (1.71 MB) Additional Information: full citation, abstract, references, citings

Data mining algorithms look for patterns in data. While most existing data mining approaches look for
patterns in a single data table, multi-relational data mining (MRDM) approaches look for patterns that
involve multiple tables (relations) from a relational database. In recent years, the most common types
of patterns and approaches considered in data mining have been extended to the multi-relational case
and MRDM now encompasses multi-relational (MR) association rule discovery, MR decision tree ...

Keywords: inductive logic programming, multi-relational data mining, relational association rules 
Full text available: pdf (698.37 KB) Additional Information: full citation, abstract, references, index terms

Modern relational database systems are beginning to support ad hoc queries on mining models. In this
article, we explore novel techniques for optimizing queries that contain predicates on the results of
application of mining models to relational data. For such queries, we use the internal structure of the
mining model to automatically derive traditional database predicates. We present algorithms for deriving
such predicates for a large class of popular discrete mining models: decision trees, nai ...
Full text available: pdf (445.41 KB) Additional Information: full citation, abstract, references, citings, index terms

Many tasks require "reasoning" (i.e., deriving conclusions from a corpus of explicitly stored
information) to solve their range of problems. An ideal reasoning system would produce all-and-only the
correct answers to every possible query, produce answers that are as specific as possible, be expressive
enough to permit any possible fact to be stored and any possible query to be asked, and be (time)
efficient.
Database management systems will continue to manage large data volumes. Thus, efficient algorithms
for accessing and manipulating large sets and sequences will be required to provide acceptable
performance. The advent of object-oriented and extensible database systems will not solve this problem.
On the contrary, modern data models exacerbate the problem: In order to manipulate large sets of
complex objects as efficiently as today's database systems manipulate simple records, query-processing ...
Probabilistic, or randomized, algorithms are fast becoming as commonplace as conventional
deterministic algorithms. This survey presents five techniques that have been widely used in the design
of randomized algorithms. These techniques are illustrated using 12 randomized algorithms (both
sequential and distributed) that span a wide range of applications, including: primality testing (a ...
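Primality testing, the first application listed, is a classic example of a randomized algorithm with one-sided error. The sketch below uses the Miller-Rabin test, chosen here as a standard representative; the survey may present a different test. A `False` answer is always correct, while a `True` answer for a composite input occurs with probability at most 4^-rounds. The function name `is_probable_prime` and the small trial-division prefilter are illustrative choices.

```python
import random
from typing import Optional


def is_probable_prime(n: int, rounds: int = 20, rng: Optional[random.Random] = None) -> bool:
    """Miller-Rabin primality test.

    Returns False only when a witness proves n composite; returns True when no
    witness was found in `rounds` random trials (error probability <= 4**-rounds).
    """
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):  # cheap trial division also handles tiny n
        if n % p == 0:
            return n == p
    rng = rng or random.Random()
    # Write n - 1 as 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is certainly composite
    return True


rng = random.Random(2005)  # fixed seed so the run is reproducible
print([k for k in range(2, 60) if is_probable_prime(k, rng=rng)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59]
print(is_probable_prime(2**61 - 1, rng=rng))  # True (a Mersenne prime)
```

Each round costs one modular exponentiation, so the test runs in time polynomial in the number of digits of n, and the error bound shrinks geometrically with `rounds`.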